State Amplification

Young-Han Kim, Arak Sutivong, and Thomas M. Cover 



Abstract: We consider the problem of transmitting data at rate $R$ over
a state-dependent channel $p(y|x, s)$ with state information available at
the sender while, at the same time, conveying information about the
channel state itself to the receiver. The amount of state information that
can be learned at the receiver is captured by the mutual information
$I(S^n; Y^n)$ between the state sequence $S^n$ and the channel output $Y^n$.
We characterize the optimal tradeoff between the information transmission
rate $R$ and the state uncertainty reduction rate $A$ when the state
information is either causally or noncausally available at the sender. In
particular, when state transmission is the only goal, the maximum
uncertainty reduction rate is given by $A^* = \max_{p(x|s)} I(X, S; Y)$.
This result is closely related, and in a sense dual, to a recent study by
Merhav and Shamai, which solves the problem of masking the state
information from the receiver rather than conveying it.



I. Introduction 

A channel $p(y|x, s)$ with noncausal state information at the sender
has capacity

$$C = \max_{p(u, x|s)} \big( I(U; Y) - I(U; S) \big) \qquad (1)$$

as shown by Gelfand and Pinsker [13]. Transmitting at capacity,
however, obscures the state information $S^n$ as received by the receiver
$Y^n$. In some instances we wish to convey the state information
itself, which could be time-varying fading parameters or an original 
image that we wish to enhance. For example, a stage actor with face 
S uses makeup X to communicate to the back row audience Y. Here 
X is used to enhance and exaggerate S rather than to communicate 
new information. Another motivation comes from cognitive radio 
systems 1121 . 1221 . Isl . 1171 with the additional assumption that the 
secondary user $X^n$ communicates its own message and at the same
time facilitates the transmission of the primary user's signal $S^n$. How
should the transmitter communicate over the channel to "amplify" 
his knowledge of the state information to the receiver? What is 
the optimal tradeoff between state amplification and independent 
information transmission? 

To answer these questions, we study the communication problem
depicted in Figure 1. Here the sender has access to the channel
state sequence $S^n = (S_1, S_2, \ldots, S_n)$, independent and identically
distributed (i.i.d.) according to $p(s)$, and wishes to transmit a message
index $W \in [2^{nR}] := \{1, 2, \ldots, 2^{nR}\}$, independent of $S^n$, as well
as to help the receiver reduce the uncertainty about the channel state
in $n$ uses of a state-dependent channel $(\mathcal{X} \times \mathcal{S}, p(y|x, s), \mathcal{Y})$. Based
on the message $W$ and the channel state $S^n$, the sender chooses
$X^n(W, S^n)$ and transmits it across the channel. Upon observing the
channel output $Y^n$, the receiver guesses $\hat{W} \in [2^{nR}]$ and forms a
list $L_n(Y^n) \subseteq \mathcal{S}^n$ that contains likely candidates of the actual state
sequence $S^n$.

Without any observation $Y^n$, the receiver would know only that
the channel state $S^n$ is one of $2^{nH(S)}$ typical sequences (with almost
certainty), and we can say the uncertainty about $S^n$ is $H(S^n)$. Now
upon observing $Y^n$ and forming a list $L_n(Y^n)$ of likely candidates
for $S^n$, the receiver's log list size is reduced from $nH(S)$ to $\log |L_n|$.
Thus we define the channel state uncertainty reduction rate to be

$$A = \frac{1}{n} \big( H(S^n) - \log |L_n| \big) = H(S) - \frac{1}{n} \log |L_n|$$

as a natural measure for the amount of information the receiver learns
about the channel state. In other words, the uncertainty reduction rate
$A \in [0, H(S)]$ captures the difference between the original channel
state uncertainty and the residual state uncertainty after observing
the channel output. Later in Section III we will draw a connection
between the list size reduction and the conventional information
measure $I(S^n; Y^n)$, which also captures the amount of information the
receiver learns about $S^n$.

Fig. 1. Pure information transmission versus state uncertainty reduction.
(Block diagram: message $W \in [2^{nR}]$, encoder $X^n(W, S^n)$, channel
$p(y|x, s)$, decoder output $(\hat{W}(Y^n), L_n(Y^n))$, with error events
$\Pr(W \ne \hat{W}(Y^n))$ and $\Pr(S^n \notin L_n(Y^n))$.)

More formally, we define a $(2^{nR}, 2^{nA}, n)$ code as the encoder map

$$X^n : [2^{nR}] \times \mathcal{S}^n \to \mathcal{X}^n$$

and decoder maps

$$\hat{W} : \mathcal{Y}^n \to [2^{nR}], \qquad L_n : \mathcal{Y}^n \to 2^{\mathcal{S}^n}$$

with list size

$$|L_n| = 2^{n(H(S) - A)}.$$

The probability of a message decoding error $P_{e,w}^{(n)}$ and the probability
of a list decoding error $P_{e,s}^{(n)}$ are defined respectively as

$$P_{e,w}^{(n)} = \Pr(\hat{W}(Y^n) \ne W), \qquad P_{e,s}^{(n)} = \Pr(S^n \notin L_n(Y^n)),$$

where the message index $W$ is chosen uniformly over $[2^{nR}]$ and
the state sequence $S^n$ is drawn i.i.d. $\sim p(s)$, independent of $W$. A
pair $(R, A)$ is said to be achievable if there exists a sequence of
$(2^{nR}, 2^{nA}, n)$ codes with $P_{e,w}^{(n)} \to 0$ and $P_{e,s}^{(n)} \to 0$ as
$n \to \infty$. Finally, we define the optimal $(R, A)$ tradeoff region, or the
tradeoff region in short, to be the closure of all achievable $(R, A)$
pairs, and denote it by $\mathcal{R}^*$.

This paper shows that the tradeoff region $\mathcal{R}^*$ can be characterized
as the union of all $(R, A)$ pairs satisfying

$$R \le I(U; Y) - I(U; S)$$
$$A \le H(S)$$
$$R + A \le I(X, S; Y)$$

for some joint distribution of the form $p(s) p(u, x|s) p(y|x, s)$.

As a special case, if the encoder's sole goal is to "amplify" the
state information ($R = 0$), then the maximum uncertainty reduction
rate

$$A^* = \sup\{A : (R, A) \text{ is achievable for some } R \ge 0\}$$

is given by

$$A^* = \min\Big\{ H(S), \max_{p(x|s)} I(X, S; Y) \Big\}. \qquad (2)$$

The maximum uncertainty reduction rate $A^*$ is achieved by designing
the signal $X^n$ to enhance the receiver's estimation of the state $S^n$
while using the remaining pure information bearing freedom in $X^n$
to provide more information about the state. More specifically, there
are three different components involved in reducing the receiver's
uncertainty about the state:

1) The transmitter uses the channel capacity to convey the state
information. In Section II we study the classical setup [19],
[15] of coding for memory with defective cells (Example 1) and
show that this "source-channel separation" scheme is optimal
when the memory defects are symmetric.

2) The transmitter gets out of the way of the receiver's view of the
state. For instance, the maximum uncertainty reduction for the
binary multiplying channel $Y = X \cdot S$ (Example 2 in Section II)
with binary input $X \in \{0, 1\}$ and binary state $S \in \{0, 1\}$ is
achieved by sending $X = 1$.

3) The transmitter actively amplifies the state. In Example 5 in
Section III we consider the Gaussian channel $Y = X + S + Z$
with Gaussian state $S$ and Gaussian noise $Z$. Here the optimal
transmitter amplifies the state as $X = \alpha S$ under the given
power constraint $E X^2 \le P$.

It is interesting to note that the maximum uncertainty reduction rate
$A^*$ is the information rate $I(X, S; Y)$ that could be achieved if both
the state $S$ and the signal $X$ could be freely designed, instead of the
state $S$ being generated by nature. This rate also appears in the sum
rate of the capacity region expression for the cooperative multiple
access channel [7, Problem 15.1] and the multiple access channel
with cribbing encoders by Willems and van der Meulen [32].

When the state information is only causally available at the
transmitter, that is, when the channel input $X_i$ depends only on the
past and current channel states $S^i$, we will show that the
tradeoff region $\mathcal{R}^*$ is given as the union of all $(R, A)$ pairs satisfying

$$R \le I(U; Y)$$
$$A \le H(S)$$
$$R + A \le I(X, S; Y)$$

over all joint distributions of the form $p(s) p(u) p(x|u, s) p(y|x, s)$.
Interestingly, the maximum uncertainty reduction rate $A^*$ stays the
same as in the noncausal case (2). That causality incurs no cost on
the (sum) rate is again reminiscent of the multiple access channel
with cribbing encoders [32].

The problem of communication over state-dependent channels with
state information known at the sender has attracted a great deal of
attention. This research area was pioneered by Shannon [27],
Kuznetsov and Tsybakov [19], and Gelfand and Pinsker [13]. Several
advancements in both theory and practice have been made over the
years. For instance, Heegard and El Gamal [15], [14] characterized
the channel capacity and devised practical coding techniques for
computer memory with defective cells. Costa [5] studied the now
famous "writing on dirty paper" problem and showed that the capacity
of an additive white Gaussian noise channel is not affected by
additional interference, as long as the entire interference sequence
is available at the sender prior to the transmission. This fascinating
result has been further extended with strong motivations from
applications in digital watermarking (see, for example, Moulin and
O'Sullivan [24], Chen and Wornell [3], and Cohen and Lapidoth [4])
and multi-antenna broadcast channels (see, for example, Caire and
Shamai [2], Weingarten, Steinberg, and Shamai [31], and Mohseni
and Cioffi [23]). Readers are referred to Caire and Shamai [1],
Lapidoth and Narayan [20], and Jafar [16] for more complete reviews
on the theoretical development of the field. On the practical side,
Erez, Shamai, and Zamir [10], [34] proposed efficient coding schemes
based on lattice strategies for binning. More recently, Erez and ten
Brink [11] report efficient coding techniques that almost achieve the
capacity of Costa's dirty paper channel.

In [29], [30], we formulated the problem of simultaneously transmitting
pure information and helping the receiver estimate the channel
state under a distortion measure. Although the characterization of the
optimal rate-distortion tradeoff is still open in general (cf. [28]), a
complete solution is given for the Gaussian case (the writing on dirty
paper channel) under quadratic distortion [29]. In this particular case,
optimality was shown for a simple power-sharing scheme between
pure information transmission via Costa's original coding scheme
and state amplification via simple scaling.

Recently, Merhav and Shamai [21] considered a related problem
of transmitting pure information, but this time under the additional 
requirement of minimizing the amount of information the receiver 
can learn about the channel state. In this interesting work, the 
optimal tradeoff between pure information rate R and the amount 
of state information E is characterized for both causal and noncausal 
setups. Furthermore, for the Gaussian noncausal case (writing on dirty 
paper), the optimal rate-distortion tradeoff is given under quadratic 
distortion. (This may well be called "writing dirty on paper".) 

The current paper thus complements [21] in a dual manner. It is
refreshing to note that our notion of uncertainty reduction rate $A$
is essentially equivalent to Merhav and Shamai's notion of $E$; both
notions capture the normalized mutual information $I(S^n; Y^n)$. (See
the discussion in Section III.) The crucial difference is that $A$ is to
be maximized while $E$ is to be minimized. Both problems admit
single-letter optimal solutions.

The rest of this paper is organized as follows. In the next section,
we establish the optimal $(R, A)$ tradeoff region for the case in which
the state information $S^n$ is noncausally available at the transmitter
before the actual communication. Section III extends the notion of
state uncertainty reduction to continuous alphabets, by identifying the
list decoding requirement $S^n \in L_n(Y^n)$ with the mutual information
rate $\frac{1}{n} I(S^n; Y^n)$. In particular, we characterize the optimal $(R, A)$
tradeoff region for Costa's "writing on dirty paper" channel. Since the
intuition gained from the study of the noncausal setup carries over
when the transmitter has causal knowledge of the state sequence,
the causal case is treated only briefly in Section IV, followed by
concluding remarks in Section V.

II. Optimal (R, A) Tradeoff: Noncausal Case

In this section, we characterize the optimal tradeoff region between
the pure information rate $R$ and the state uncertainty reduction rate
$A$ with state information noncausally available at the transmitter, as
formulated in Section I.

Theorem 1: The tradeoff region $\mathcal{R}^*$ for a state-dependent channel
$(\mathcal{X} \times \mathcal{S}, p(y|x, s), \mathcal{Y})$ with state information $S^n$ noncausally known
at the transmitter is the union of all $(R, A)$ pairs satisfying

$$R \le I(U; Y) - I(U; S) \qquad (3)$$
$$A \le H(S) \qquad (4)$$
$$R + A \le I(X, S; Y) \qquad (5)$$

for some joint distribution of the form $p(s) p(u, x|s) p(y|x, s)$, where
the auxiliary random variable $U$ has cardinality bounded by
$|\mathcal{U}| \le |\mathcal{X}| \cdot |\mathcal{S}|$.

As will be clear from the proof of the converse, the region given by
(3)-(5) is convex. (We can merge the time-sharing random variable
into $U$.) Since the auxiliary random variable $U$ affects the first
inequality (3) only, the cardinality bound on $U$ follows directly
from the usual technique; see Gelfand and Pinsker [13] or a general
treatment by Salehi [26]. Finally, we can take $X$ as a deterministic
function of $(U, S)$ without reducing the region, but at the cost of
increasing the cardinality bound of $U$; refer to the proof of Lemma 2
below.

It is easy to see that we can recover the Gelfand-Pinsker capacity
formula

$$\begin{aligned}
C &= \max\{R : (R, A) \in \mathcal{R}^* \text{ for some } A \ge 0\} \\
  &= \max_{p(x, u|s)} \big( I(U; Y) - I(U; S) \big).
\end{aligned}$$

For the other extreme case of pure state amplification, we have the
following result.

Corollary 1: Under the condition of Theorem 1, the maximum
uncertainty reduction rate $A^* = \max\{A : (R, A) \in \mathcal{R}^* \text{ for some } R \ge 0\}$
is given by

$$A^* = \min\Big\{ H(S), \max_{p(x|s)} I(X, S; Y) \Big\}. \qquad (6)$$

Thus the receiver can learn about the state $S^n$ essentially at the
maximal cut-set rate $I(X, S; Y)$.

Before we prove Theorem 1, we need the following two lemmas.
The first one extends Fano's inequality [7, Lemma 7.9.1] to list
decoding.

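Lemma 1: For a sequence of list decoders $L_n : \mathcal{Y}^n \to 2^{\mathcal{S}^n}$,
$Y^n \mapsto L_n(Y^n)$, with list size $|L_n|$ fixed for each $n$, let
$P_{e,s}^{(n)} = \Pr(S^n \notin L_n(Y^n))$ be the sequence of corresponding
probabilities of list decoding error. If $P_{e,s}^{(n)} \to 0$, then

$$H(S^n | Y^n) \le \log |L_n| + n \epsilon_n$$

where $\epsilon_n \to 0$ as $n \to \infty$.

Proof: Define an error random variable $E$ as

$$E = \begin{cases} 0, & \text{if } S^n \in L_n, \\ 1, & \text{if } S^n \notin L_n. \end{cases}$$

We can then expand

$$\begin{aligned}
H(E, S^n | Y^n) &= H(S^n | Y^n) + H(E | Y^n, S^n) \\
                &= H(E | Y^n) + H(S^n | Y^n, E).
\end{aligned}$$

Note that $H(E | Y^n) \le 1$ and $H(E | Y^n, S^n) = 0$. We can also bound
$H(S^n | Y^n, E)$ as

$$\begin{aligned}
H(S^n | Y^n, E) &= H(S^n | Y^n, E = 0) \Pr(E = 0) + H(S^n | Y^n, E = 1) \Pr(E = 1) \\
                &\le \log |L_n| \, (1 - P_{e,s}^{(n)}) + n \log |\mathcal{S}| \, P_{e,s}^{(n)}
\end{aligned}$$

where the inequality follows because when there is no error, the
remaining uncertainty is at most $\log |L_n|$, and when there is an error,
the uncertainty is at most $n \log |\mathcal{S}|$. This implies that

$$\begin{aligned}
H(S^n | Y^n) &\le 1 + \log |L_n| \, (1 - P_{e,s}^{(n)}) + n \log |\mathcal{S}| \, P_{e,s}^{(n)} \\
             &= \log |L_n| + 1 + (n \log |\mathcal{S}| - \log |L_n|) P_{e,s}^{(n)}.
\end{aligned}$$

Taking $\epsilon_n = \frac{1}{n} + (\log |\mathcal{S}| - \frac{1}{n} \log |L_n|) P_{e,s}^{(n)}$ proves the desired result.

■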
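The list version of Fano's inequality can be sanity-checked numerically. The following is a minimal sketch for a single channel use ($n = 1$), assuming a small hand-picked joint distribution and a decoder that lists the $k$ most likely states for each output; the function name `list_fano_check` and the numbers in `joint` are illustrative assumptions, not part of the paper.

```python
import math


def list_fano_check(joint, k):
    """Check H(S|Y) <= log|L| + 1 + (log|S| - log|L|) * P_e for n = 1.

    joint[y][s] = Pr(S = s, Y = y). The (hypothetical) decoder lists the k
    most likely states for each output y, so |L| = k for every y.
    """
    num_states = len(joint[0])
    cond_entropy = 0.0  # H(S | Y) in bits
    p_err = 0.0         # Pr(S not in L(Y))
    for row in joint:
        p_y = sum(row)
        if p_y == 0:
            continue
        ranked = sorted(row, reverse=True)
        p_err += sum(ranked[k:])  # mass of states left off the list
        cond_entropy -= sum(p * math.log2(p / p_y) for p in row if p > 0)
    # Right-hand side from the last display of the proof with n = 1.
    bound = math.log2(k) + 1 + (math.log2(num_states) - math.log2(k)) * p_err
    return cond_entropy, p_err, bound


# Illustrative joint pmf over 2 outputs and 4 states (rows sum to 0.4 and 0.6).
joint = [
    [0.30, 0.05, 0.03, 0.02],
    [0.02, 0.04, 0.25, 0.29],
]
h, pe, bound = list_fano_check(joint, k=2)
```

For this toy distribution the list error probability is 0.11, and the conditional entropy stays below the bound, as the lemma requires.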

The second lemma is crucial to the proof of Theorem 1 and
contains a more interesting technique than Lemma 1. This lemma
shows that the third inequality (5) can be replaced by the tighter
inequality (7) below (recall that $I(U, S; Y) \le I(X, S; Y)$ since
$U \to (X, S) \to Y$), which becomes crucial for the achievability
proof of Theorem 1.

Lemma 2: Let $\mathcal{R}$ be the union of all $(R, A)$ pairs satisfying (3)-(5).
Let $\mathcal{R}_0$ be the closure of the union of all $(R, A)$ pairs satisfying

$$R \le I(U; Y) - I(U; S) \qquad (3)$$
$$A \le H(S) \qquad (4)$$
$$R + A \le I(U, S; Y) \qquad (7)$$

for some joint distribution $p(s) p(x, u|s) p(y|x, s)$, where the auxiliary
random variable $U$ has finite cardinality. Then

$$\mathcal{R} = \mathcal{R}_0.$$



Proof: Since $U \to (X, S) \to Y$ forms a Markov chain, it is
trivial to check that

$$\mathcal{R}_0 \subseteq \mathcal{R}. \qquad (8)$$

For the other direction of inclusion, we need some notation. Let
$\mathcal{P}$ be the set of all distributions of the form $p(s) p(x, u|s) p(y|x, s)$
consistent with the given $p(s)$ and $p(y|x, s)$, where the auxiliary
random variable $U$ is defined on an arbitrary finite set. Further let
$\mathcal{P}'$ be the restriction of $\mathcal{P}$ such that $X = f(U, S)$ for some function
$f$, i.e., $p(x|u, s)$ takes values 0 or 1 only.

If we define $\mathcal{R}_1$ to denote the closure of all $(R, A)$ pairs satisfying
(3), (4), and (7) over $\mathcal{P}'$, or equivalently, if $\mathcal{R}_1$ is defined to be the
restriction of $\mathcal{R}_0$ over a smaller set of distributions $\mathcal{P}'$, then clearly

$$\mathcal{R}_1 \subseteq \mathcal{R}_0. \qquad (9)$$

Let $\mathcal{R}_2$ be defined as the closure of $(R, A)$ pairs satisfying (3)-(5)
over $\mathcal{P}'$. Since $X \to (U, S) \to Y$ forms a Markov chain on $\mathcal{P}'$, we have

$$\mathcal{R}_2 \subseteq \mathcal{R}_1. \qquad (10)$$

To complete the proof, it now suffices to show that

$$\mathcal{R} \subseteq \mathcal{R}_2. \qquad (11)$$

To see this, we restrict $\mathcal{R}_2$ to the distributions of the form $U = (V, \tilde{U})$
with $V$ independent of $(\tilde{U}, S)$, namely,

$$p(x, u|s) = p(x, v, \tilde{u}|s) = p(v) \, p(\tilde{u}|s) \, p(x|v, \tilde{u}, s) \qquad (12)$$

with deterministic $p(x|v, \tilde{u}, s)$, i.e., $x$ is a function of $(v, \tilde{u}, s)$,
and call this restriction $\mathcal{R}_3$. Since $X$ is a deterministic function of
$(V, \tilde{U}, S)$ and at the same time $(V, \tilde{U}) \to (X, S) \to Y$ form a
Markov chain, $\mathcal{R}_3$ can be written as the closure of all $(R, A)$ pairs
satisfying

$$R \le I(V, \tilde{U}; Y) - I(V, \tilde{U}; S)$$
$$A \le H(S)$$
$$R + A \le I(V, \tilde{U}, S; Y) = I(X, S; Y)$$

for some distribution of the form $p(s) p(x, v, \tilde{u}|s) p(y|x, s)$ satisfying
(12). But we have

$$\begin{aligned}
I(V, \tilde{U}; Y) - I(V, \tilde{U}; S) &\ge I(\tilde{U}; Y) - I(V, \tilde{U}; S) \\
                                        &= I(\tilde{U}; Y) - I(\tilde{U}; S)
\end{aligned}$$

and the set of conditional distributions on $(\tilde{U}, X)$ given $S$ satisfying
(12) is as rich as any $p(u, x|s)$. (Indeed, any conditional distribution
$p(a|b)$ can be represented as $\sum_c p(c) p(a|b, c)$ for appropriately
chosen $p(c)$ and deterministic distribution $p(a|b, c)$ with cardinality
of $\mathcal{C}$ upper bounded by $(|\mathcal{A}| - 1) |\mathcal{B}| + 1$; see also (H Eq. (44)].)
Therefore, we have

$$\mathcal{R} \subseteq \mathcal{R}_3 \subseteq \mathcal{R}_2 \qquad (13)$$

which completes the proof. ■
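The parenthetical claim in the proof, that any $p(a|b)$ is a mixture of deterministic maps with at most $(|\mathcal{A}| - 1)|\mathcal{B}| + 1$ components, can be illustrated constructively. The sketch below uses one standard construction (a shared uniform variable cut at the breakpoints of the conditional CDFs), which is an assumption on our part and not necessarily the construction the authors had in mind; the names `functional_representation` and `reconstruct` are hypothetical.

```python
from fractions import Fraction


def functional_representation(p_a_given_b):
    """Write p(a|b) as sum_c p(c) * 1{a = f_c(b)}.

    p_a_given_b[b][a] are Fractions summing to 1 over a for each b.
    Returns (p_c, f) with f[c][b] the symbol chosen by map c for input b.
    """
    cdfs = []
    cuts = {Fraction(0), Fraction(1)}
    for row in p_a_given_b:
        acc = Fraction(0)
        cdf = []
        for p in row:
            acc += p
            cdf.append(acc)
            cuts.add(acc)  # each b adds at most |A| - 1 interior cut points
        cdfs.append(cdf)
    cuts = sorted(cuts)
    p_c, f = [], []
    for lo, hi in zip(cuts, cuts[1:]):
        mid = (lo + hi) / 2
        # Inverse-CDF rule: for each b pick the first a whose CDF exceeds mid.
        f.append([next(a for a, c in enumerate(cdf) if mid < c) for cdf in cdfs])
        p_c.append(hi - lo)
    return p_c, f


def reconstruct(p_c, f, num_a):
    """Recompute p(a|b) from the mixture of deterministic maps."""
    out = [[Fraction(0)] * num_a for _ in range(len(f[0]))]
    for weight, mapping in zip(p_c, f):
        for b, a in enumerate(mapping):
            out[b][a] += weight
    return out


# Illustrative conditional pmf with |A| = 2 and |B| = 3.
p = [
    [Fraction(1, 2), Fraction(1, 2)],
    [Fraction(1, 3), Fraction(2, 3)],
    [Fraction(1), Fraction(0)],
]
p_c, f = functional_representation(p)
```

Exact rational arithmetic is used so that the reconstruction can be compared with the original table without rounding error.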
Now we are ready to prove Theorem 1.

Proof of Theorem 1: For the proof of achievability, in the light
of Lemma 2, it suffices to prove that any pair $(R, A)$ satisfying (3),
(4), (7) for some $p(u, x|s)$ is achievable. Since the coding technique
is quite standard, we only sketch the proof here. For fixed $p(u, x|s)$,
the result of Gelfand-Pinsker [13] shows that the transmitter can send
$I(U; Y) - I(U; S)$ bits reliably across the channel. Now we allocate
$0 \le R \le I(U; Y) - I(U; S)$ bits for sending the pure information
and use the remaining $\Gamma = I(U; Y) - I(U; S) - R$ bits for sending
the state information by random binning. More specifically, we assign
typical $S^n$ sequences to $2^{n\Gamma}$ bins at random and send the bin index of
the observed $S^n$ using $n\Gamma$ bits. At the receiving end, the receiver is
able to decode the codeword $U^n$ from $Y^n$ with high probability.
Using joint typicality of $(Y^n, U^n, S^n)$, the state uncertainty can
be first reduced from $H(S)$ to $H(S|Y, U)$. Indeed, the number of
typical $S^n$ sequences jointly typical with $(Y^n, U^n)$ is bounded by
$2^{n(H(S|Y, U) + \epsilon)}$. In addition, using $\Gamma = I(U; Y) - I(U; S) - R$ bits
of independent refinement information from the hash index of $S^n$,
we can further reduce the state uncertainty by $\Gamma$. Hence, by taking
the list of all $S^n$ sequences jointly typical with $(Y^n, U^n)$ satisfying
the hash check, we have the total state uncertainty reduction rate

$$\begin{aligned}
A &= I(U, Y; S) + \Gamma \\
  &= I(U, Y; S) + I(U; Y) - I(U; S) - R \\
  &= I(U, S; Y) - R.
\end{aligned}$$

By varying $0 \le R \le I(U; Y) - I(U; S)$, it can be readily seen that
all $(R, A)$ pairs satisfying

$$R \le I(U; Y) - I(U; S)$$
$$A \le H(S)$$
$$R + A \le I(U, S; Y)$$

for any fixed $p(x, u|s)$ are achievable.
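The last step of the achievability argument rests on the identity $I(U, Y; S) + I(U; Y) - I(U; S) = I(U, S; Y)$, together with $I(U, S; Y) \le I(X, S; Y)$ from the Markov chain $U \to (X, S) \to Y$. A minimal numerical sketch, assuming binary alphabets and a randomly drawn distribution of the form $p(s) p(u, x|s) p(y|x, s)$; the helper names and the seed are illustrative assumptions.

```python
import itertools
import math
import random


def mutual_information(joint, left, right):
    """I(left; right) in bits; left/right are index tuples into each outcome."""
    def marginal(idx):
        out = {}
        for outcome, prob in joint.items():
            key = tuple(outcome[i] for i in idx)
            out[key] = out.get(key, 0.0) + prob
        return out

    p_left, p_right, p_both = marginal(left), marginal(right), marginal(left + right)
    split = len(left)
    return sum(
        prob * math.log2(prob / (p_left[key[:split]] * p_right[key[split:]]))
        for key, prob in p_both.items() if prob > 0
    )


def random_pmf(rng, size):
    weights = [rng.random() for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def random_joint(seed=0):
    """Random binary p(s) p(u, x|s) p(y|x, s); outcomes are (s, u, x, y)."""
    rng = random.Random(seed)
    p_s = random_pmf(rng, 2)
    p_ux = [random_pmf(rng, 4) for _ in range(2)]  # indexed by 2*u + x
    p_y = {(x, s): random_pmf(rng, 2) for x in range(2) for s in range(2)}
    return {
        (s, u, x, y): p_s[s] * p_ux[s][2 * u + x] * p_y[(x, s)][y]
        for s, u, x, y in itertools.product(range(2), repeat=4)
    }


S, U, X, Y = 0, 1, 2, 3
joint = random_joint()
# I(U;Y) - I(U;S); may be negative for an arbitrary, non-optimized U.
gp_rate = mutual_information(joint, (U,), (Y,)) - mutual_information(joint, (U,), (S,))
list_rate = mutual_information(joint, (U, Y), (S,))   # I(U, Y; S)
i_usy = mutual_information(joint, (U, S), (Y,))       # I(U, S; Y)
i_xsy = mutual_information(joint, (X, S), (Y,))       # I(X, S; Y)
```

The identity itself is a chain-rule fact and holds for any joint distribution; only the comparison with $I(X, S; Y)$ uses the channel structure.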

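For the proof of converse, we have to show that given any sequence
of $(2^{nR}, 2^{nA}, n)$ codes with $P_{e,w}^{(n)}, P_{e,s}^{(n)} \to 0$, the $(R, A)$ pairs must
satisfy

$$R \le I(U; Y) - I(U; S)$$
$$A \le H(S)$$
$$R + A \le I(X, S; Y)$$

for some joint distribution $p(s) p(x, u|s) p(y|x, s)$.

The pure information rate $R$ can be readily bounded from the
previous work by Gelfand and Pinsker [13, Proposition 3]. Here
we repeat a simpler proof given in Heegard [14, Appendix 2]
for completeness; see also |§] Lecture 13]. Starting with Fano's
inequality, we have the following chain of inequalities:

$$\begin{aligned}
nR &\le I(W; Y^n) + n\epsilon_n \\
   &= \sum_{i=1}^n I(W; Y_i | Y^{i-1}) + n\epsilon_n \\
   &\le \sum_{i=1}^n I(W, Y^{i-1}; Y_i) + n\epsilon_n \\
   &= \sum_{i=1}^n I(W, Y^{i-1}, S_{i+1}^n; Y_i) - \sum_{i=1}^n I(Y_i; S_{i+1}^n | W, Y^{i-1}) + n\epsilon_n \\
   &\overset{(a)}{=} \sum_{i=1}^n I(W, Y^{i-1}, S_{i+1}^n; Y_i) - \sum_{i=1}^n I(Y^{i-1}; S_i | W, S_{i+1}^n) + n\epsilon_n \\
   &\overset{(b)}{=} \sum_{i=1}^n I(W, Y^{i-1}, S_{i+1}^n; Y_i) - \sum_{i=1}^n I(W, Y^{i-1}, S_{i+1}^n; S_i) + n\epsilon_n
\end{aligned}$$

where (a) follows from the Csiszar sum formula

$$\begin{aligned}
\sum_{i=1}^n I(Y_i; S_{i+1}^n | W, Y^{i-1})
  &= \sum_{i=1}^n \sum_{j=i+1}^n I(Y_i; S_j | W, S_{j+1}^n, Y^{i-1}) \\
  &= \sum_{j=1}^n \sum_{i=1}^{j-1} I(Y_i; S_j | W, S_{j+1}^n, Y^{i-1}) \\
  &= \sum_{j=1}^n I(Y^{j-1}; S_j | W, S_{j+1}^n)
\end{aligned}$$

and (b) follows because $(W, S_{i+1}^n)$ is independent of $S_i$. By
recognizing the auxiliary random variable $U_i = (W, Y^{i-1}, S_{i+1}^n)$ and
noting that $U_i \to (X_i, S_i) \to Y_i$ form a Markov chain, we have

$$nR \le \sum_{i=1}^n \big( I(U_i; Y_i) - I(U_i; S_i) \big) + n\epsilon_n. \qquad (14)$$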

On the other hand, since $\log |L_n| = n(H(S) - A)$, we can trivially
bound $A$ by Lemma 1 as

$$\begin{aligned}
nA &\le nH(S) - H(S^n | Y^n) + n\epsilon'_n \\
   &\le nH(S) + n\epsilon'_n.
\end{aligned}$$

Similarly, we can bound $R + A$ as

$$\begin{aligned}
n(R + A) &\le I(W; Y^n) + I(S^n; Y^n) + n\epsilon''_n \\
  &\overset{(a)}{\le} I(W; Y^n | S^n) + I(S^n; Y^n) + n\epsilon''_n \\
  &= I(W, S^n; Y^n) + n\epsilon''_n \\
  &\overset{(b)}{=} I(X^n, S^n; Y^n) + n\epsilon''_n \\
  &\overset{(c)}{\le} \sum_{i=1}^n I(X_i, S_i; Y_i) + n\epsilon''_n \qquad (15)
\end{aligned}$$

where (a) follows since $W$ is independent of $S^n$ and conditioning
reduces entropy, (b) follows from the data processing inequality (both
directions), and (c) follows from the memorylessness of the channel.

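We now introduce the usual time-sharing random variable $Q$
uniform over $\{1, \ldots, n\}$, independent of everything else. Then (14)
implies

$$\begin{aligned}
R &\le I(U_Q; Y_Q | Q) - I(U_Q; S_Q | Q) + \epsilon_n \\
  &\le I(U_Q, Q; Y_Q) - I(U_Q, Q; S_Q) + \epsilon_n,
\end{aligned}$$

where the second step uses $I(U_Q; Y_Q | Q) \le I(U_Q, Q; Y_Q)$ and the fact
that $S_Q$ is independent of $Q$. On the other hand, (15) implies

$$\begin{aligned}
R + A &\le I(X_Q, S_Q; Y_Q | Q) + \epsilon''_n \\
      &\le I(X_Q, S_Q, Q; Y_Q) + \epsilon''_n \\
      &= I(X_Q, S_Q; Y_Q) + \epsilon''_n,
\end{aligned}$$

where the last equality follows since $Q \to (X_Q, S_Q) \to Y_Q$ form a
Markov chain.

Finally, we recognize $U = (U_Q, Q)$, $X = X_Q$, $S = S_Q$, $Y = Y_Q$,
and note that $S \sim p(s)$, $\Pr(Y = y | X = x, S = s) = p(y|x, s)$, and
$U \to (X, S) \to Y$, which completes the proof of the converse. ■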

Roughly speaking, the optimal coding scheme is equivalent to
sending the codeword $U^n$ reliably at the Gelfand-Pinsker rate
$R' = I(U; Y) - I(U; S)$ and reducing the receiver's uncertainty by
$A' = I(S; U, Y)$ from $Y^n$ and the decoded codeword $U^n$. It should
be noted that $(R', A')$ has the same form as the achievable region
for the dual tradeoff problem between pure information rate $R$ and
(minimum) normalized mutual information rate $E = \frac{1}{n} I(S^n; Y^n)$
studied in [21]. But we can reduce the uncertainty about $S^n$ further
by allocating part $\Gamma$ of the pure information rate $R'$ to convey
independent refinement information (hash index of $S^n$). By varying
$\Gamma \in [0, R']$ we can trace the entire tradeoff region $(R' - \Gamma, A' + \Gamma)$.

It turns out an alternative coding scheme based on Wyner-Ziv
source coding with side information [33], instead of random binning,
also achieves the tradeoff region $\mathcal{R}^*$. To see this, fix any $p(u, x|s)$
and $p(v|s)$ satisfying

$$\Gamma := I(V; S | U, Y) \le I(U; Y) - I(U; S)$$

and consider the Wyner-Ziv encoding of $S^n$ with covering codeword
$V^n$ and side information $(U^n, Y^n)$ at the decoder. More specifically,
we can generate $2^{nI(V; S)}$ $V^n$ codewords and assign them into $2^{n\Gamma}$
bins. As before we use the Gelfand-Pinsker coding to convey a message
of rate $I(U; Y) - I(U; S)$ reliably over the channel. Since the
rate $\Gamma = I(V; S | U, Y)$ is sufficient to reconstruct $V^n$ at the receiver
with side information $Y^n$ and $U^n$, we can allocate the rate $\Gamma$ for
conveying $V^n$ and use the remaining rate $R = I(U; Y) - I(U; S) - \Gamma$
for extra pure information. Forming a list of $S^n$ jointly typical with
$(Y^n, U^n, V^n)$ results in the uncertainty reduction rate $A$ given by

$$\begin{aligned}
A &= I(S; Y, U, V) \\
  &= I(S; Y, U) + \Gamma \\
  &= I(S; U, Y) + I(U; Y) - I(U; S) - R \\
  &= I(U, S; Y) - R.
\end{aligned}$$

Fig. 2. Memory with defective cells. (Panels: the channel from $X$ to $Y$
for states $S = 0$, $S = 1$, and $S = 2$.)

Thus the tradeoff region $\mathcal{R}^*$ can be achieved via the combination
of two fundamental results in communication with side information:
channel coding with side information by Gelfand and Pinsker [13] and
rate distortion with side information by Wyner and Ziv [33]. It is also
interesting to note that the information about S" can be transmitted 
in a manner completely independent of geometry (random binning) 
or completely dependent on geometry (random covering); refer to |I6| 
for a similar phenomenon in a relay channel problem. 

When $Y$ is a function of $(X, S)$, it is optimal to identify $U = Y$,
and Theorem 1 simplifies to the following corollary.

Corollary 2: The tradeoff region $\mathcal{R}^*$ for a deterministic
state-dependent channel $Y = f(X, S)$ with state information $S^n$
noncausally known at the transmitter is the union of all $(R, A)$ pairs
satisfying

$$R \le H(Y | S) \qquad (16)$$
$$A \le H(S) \qquad (17)$$
$$R + A \le H(Y) \qquad (18)$$

for some joint distribution of the form $p(s) p(x|s) p(y|x, s)$. In
particular, the maximum uncertainty reduction rate is given by

$$A^* = \min\Big\{ H(S), \max_{p(x|s)} H(Y) \Big\}. \qquad (19)$$



The next two examples show different flavors of optimal state 
uncertainty reduction. 

Example 1: Consider the problem of conveying information using
a write-once memory device with stuck-at defective cells [19], [15]
as depicted in Figure 2. Here each memory cell has probability $p$ of
being stuck at 0, probability $q$ of being stuck at 1, and probability $r$
of being a good cell, with $p + q + r = 1$. It is easy to see that the
channel output $Y$ is a simple deterministic function of the channel
input $X$ and the state $S$.

Now it is easy to verify that the tradeoff region $\mathcal{R}^*$ is given by

$$R \le r H(\alpha) \qquad (20)$$
$$A \le H(p, q, r) \qquad (21)$$
$$R + A \le H(p + \alpha r, q + (1 - \alpha) r) \qquad (22)$$

where $\alpha$ can be chosen arbitrarily ($0 \le \alpha \le 1$). This region is
achieved by choosing $p(x) \sim \mathrm{Bern}(\alpha)$. Without loss of generality,
we can choose $X \sim \mathrm{Bern}(\alpha)$ independent of $S$, because the input
$X$ affects $Y$ only when $S = 2$.
There are two cases to consider.















Fig. 3. The optimal (R, A) tradeoff for memory with defective cells:
(a) (p, q, r) = (1/3, 1/3, 1/3); (b) (p, q, r) = (1/2, 1/6, 1/3).



(a) If $p = q$, then the choice of $\alpha^* = 1/2$ maximizes both
(20) and (22), and hence achieves the entire tradeoff region
$\mathcal{R}^*$. The optimal transmitter splits the full channel capacity
$C = r H(\alpha^*) = r$ to send both the pure information and
the state information. (See Figure 3(a) for the case
$(p, q, r) = (1/3, 1/3, 1/3)$.)

(b) On the other hand, when $p \ne q$, there is a clear tradeoff in
our choice of $\alpha$. For example, consider the case
$(p, q, r) = (1/2, 1/6, 1/3)$. If the goal is to communicate pure
information over the channel, we should take $\alpha^* = 1/2$ to maximize
the number of distinguishable input preparations. This gives the
channel capacity $C = r H(\alpha^*) = 1/3$. If the goal is, however, to
help the receiver reduce the state uncertainty, we take $\alpha^* = 0$,
i.e., we transmit a fixed signal $X = 0$. This way, the transmitter
can minimize his interference with the receiver's view of the
state $S$. The entire tradeoff region is given in Figure 3(b).
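The two cases can be checked numerically by evaluating the right-hand sides of (20)-(22) as functions of the input parameter. This is a minimal sketch that only evaluates those three expressions as stated in the text; the function names and the grid search over the parameter are illustrative assumptions.

```python
import math


def entropy(*probs):
    """Entropy in bits of a probability vector given as arguments."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def memory_region_bounds(p, q, r, alpha):
    """Right-hand sides of (20), (21), (22) for a given alpha."""
    rate = r * entropy(alpha, 1 - alpha)
    state = entropy(p, q, r)
    total = entropy(p + alpha * r, q + (1 - alpha) * r)
    return rate, state, total


def max_state_rate(p, q, r, steps=1000):
    """Largest A at R = 0: min of H(S) and the best sum bound over an alpha grid."""
    best_total = max(
        memory_region_bounds(p, q, r, k / steps)[2] for k in range(steps + 1)
    )
    return min(entropy(p, q, r), best_total)


# Case (a): symmetric defects, alpha = 1/2 maximizes both (20) and (22).
sym_rate, sym_state, sym_total = memory_region_bounds(1 / 3, 1 / 3, 1 / 3, 0.5)

# Case (b): asymmetric defects, capacity at alpha = 1/2 versus best state rate.
asym_capacity = memory_region_bounds(0.5, 1 / 6, 1 / 3, 0.5)[0]
asym_best_state = max_state_rate(0.5, 1 / 6, 1 / 3)
```

In case (a) the sum bound reaches 1 bit at the same parameter that gives capacity $r = 1/3$; in case (b) the capacity is still $1/3$, while the best state rate of 1 bit is reached at the endpoint of the parameter range.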

Example 2: Consider the binary multiplying channel $Y = X \cdot S$,
where the output $Y$ is the product of the input $X \in \{0, 1\}$ and the
state $S \in \{0, 1\}$. We assume that the state sequence $S^n$ is drawn
i.i.d. according to $\mathrm{Bern}(\gamma)$. It can be easily shown that the optimal
tradeoff region is given by

$$R \le \gamma H(\alpha) \qquad (23)$$
$$A \le H(\gamma) \qquad (24)$$
$$R + A \le H(\alpha \gamma). \qquad (25)$$

This is achieved by $p(x) \sim \mathrm{Bern}(\alpha)$, independent of $S$.

As in Example 1(b), there is a tension between the pure information
transmission and the state amplification. When the goal is to
maximize the pure information rate, we should choose $\alpha^* = 1/2$ to
achieve the capacity $C = \gamma$. But when the goal is to maximize the
state uncertainty reduction rate, we should choose $\alpha^* = 1$ ($X = 1$)
to achieve $A^* = H(\gamma)$. In words, to maximize the state uncertainty
reduction rate, the transmitter simply clears the receiver's view of the
state.
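The three bounds for the binary multiplying channel follow from Corollary 2 by direct enumeration. The sketch below computes $H(Y|S)$, $H(S)$, and $H(Y)$ from the joint distribution of an input that is Bernoulli and independent of the state, so the closed forms can be compared against it; the function names and the test values 0.5, 1.0, and 0.3 are illustrative assumptions.

```python
import math


def entropy(probs):
    """Entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def binary_entropy(p):
    return entropy([p, 1 - p])


def multiplying_channel_bounds(alpha, gamma):
    """Return (H(Y|S), H(S), H(Y)) for Y = X * S.

    X ~ Bern(alpha) is drawn independently of S ~ Bern(gamma).
    """
    p_y = [0.0, 0.0]
    h_y_given_s = 0.0
    for s, p_s in ((0, 1 - gamma), (1, gamma)):
        cond = [0.0, 0.0]  # distribution of Y given S = s
        for x, p_x in ((0, 1 - alpha), (1, alpha)):
            cond[x * s] += p_x
            p_y[x * s] += p_s * p_x
        h_y_given_s += p_s * entropy(cond)
    return h_y_given_s, binary_entropy(gamma), entropy(p_y)


# Pure information: alpha = 1/2 gives H(Y|S) = gamma.
rate_half, state_half, total_half = multiplying_channel_bounds(0.5, 0.3)
# Pure state amplification: alpha = 1 (X = 1) gives H(Y) = H(S).
rate_one, state_one, total_one = multiplying_channel_bounds(1.0, 0.3)
```

With the input fixed to 1 the output equals the state, so the sum bound coincides with $H(\gamma)$ and no rate is left for pure information.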

III. Extension to Continuous State Space 

The previous section characterized the tradeoff region $\mathcal{R}^*$ between
the pure information rate $R$ and the state uncertainty reduction rate
$A = H(S) - \frac{1}{n} \log |L_n(Y^n)|$. Apparently the notion of uncertainty
reduction rate $A$ is meaningful only when the channel state $S$ has
finite cardinality (i.e., $|\mathcal{S}| < \infty$), or at least when $H(S) < \infty$.

However, from the proof of Theorem 1 (the generalized Fano's
inequality in Lemma 1), along with the fact that the optimal region
is single-letterizable, we can take an alternative look at the notion of
state uncertainty reduction as reducing the list size from $2^{nH(S)}$ to
$|L_n(Y^n)|$. We will show shortly in Proposition 1 that the difference
$A = H(S) - \frac{1}{n} \log |L_n|$ of the normalized list size is essentially
equivalent to the normalized mutual information $A_I = \frac{1}{n} I(S^n; Y^n)$,
which is well-defined for an arbitrary state space $\mathcal{S}$ and captures the
amount of information the receiver can learn about the state $S^n$
(or lack thereof [21]). Hence, the physically motivated notion $A$ of
list size reduction is consistent with the mathematical information
measure $A_I$, and both notions of state uncertainty reduction can be
used interchangeably, especially when $\mathcal{S}$ is finite.

Fig. 4. Writing on dirty paper. (Block diagram: state $S^n \sim N(0, QI)$,
noise $Z^n \sim N(0, NI)$, encoder $X^n(W, S^n)$ with $\sum_i E X_i^2 \le nP$,
decoder $\hat{W}(Y^n)$, with $\Pr(W \ne \hat{W}(Y^n))$ and
$\lim \frac{1}{n} I(S^n; Y^n) \ge A_I$.)

To be more precise, we define a $(2^{nR}, n)$ code by an encoding
function

$$X^n : [2^{nR}] \times \mathcal{S}^n \to \mathcal{X}^n$$

and a decoding function

$$\hat{W} : \mathcal{Y}^n \to [2^{nR}].$$

Then the associated state uncertainty reduction rate for the $(2^{nR}, n)$
code is defined as

$$A_I = \frac{1}{n} I(S^n; Y^n)$$

where the mutual information is with respect to the joint distribution

$$p(x^n, s^n, y^n) = p(x^n | s^n) \prod_{i=1}^n p(s_i) \, p(y_i | x_i, s_i)$$

induced by $X^n(W, S^n)$ with message $W$ distributed uniformly over
$[2^{nR}]$, independent of $S^n$. Similarly, the probability of error is defined
as

$$P_e^{(n)} = \Pr(W \ne \hat{W}(Y^n)).$$

A pair $(R, A_I)$ is said to be achievable if there exists a sequence of
$(2^{nR}, n)$ codes with $P_e^{(n)} \to 0$ and

$$\lim_{n \to \infty} \frac{1}{n} I(S^n; Y^n) \ge A_I.$$

The closure of all achievable $(R, A_I)$ pairs is called the tradeoff region
$\mathcal{R}^*_I$. (Here we use the notation $\mathcal{R}^*_I$ instead of $\mathcal{R}^*$ to temporarily
distinguish this from the original problem formulated in terms of the
list size reduction.)

We now show that the optimal tradeoff $\mathcal{R}^*_I$ between the information
transmission rate $R$ and the mutual information rate $A_I$ has the same
solution as the optimal tradeoff $\mathcal{R}^*$ between $R$ and the list size
reduction rate $A$.

Proposition 1: The tradeoff region $\mathcal{R}^*_I$ for a state-dependent
channel $(\mathcal{X} \times \mathcal{S}, p(y|x, s), \mathcal{Y})$ with state information noncausally
known at the transmitter is the closure of all $(R, A)$ pairs satisfying

$$R \le I(U; Y) - I(U; S) \qquad (3)$$
$$A \le H(S) \qquad (4)$$
$$R + A \le I(X, S; Y) \qquad (5)$$

for some joint distribution of the form $p(s) p(u, x|s) p(y|x, s)$ with
auxiliary random variable $U$. Hence, $\mathcal{R}^*_I$ has the identical
characterization as $\mathcal{R}^*$ in Theorem 1.

Proof: Let $\mathcal{R}^{**}$ be the region described by (3)-(5). We provide a
sandwich proof $\mathcal{R}^{**} = \mathcal{R}^* \subseteq \mathcal{R}^*_I \subseteq \mathcal{R}^{**}$, which is given implicitly
in the proof of Theorem 1.



More specifically, consider a finite partition (see footnote 1) to quantize
the state random variable $S$ into $[S]$. Under this partition, let
$\mathcal{R}^{**}_{[S]}$ be the set of all $(R, A)$ pairs satisfying

$$R \le I(U; Y) - I(U; [S])$$
$$A \le H([S])$$
$$R + A \le I(X, [S]; Y)$$

for some joint distribution of the form $p([s]) p(u, x|[s]) p(y|x, [s])$
with auxiliary random variable $U$. Consider the original list size
reduction problem with state information $[S]$ and let $\mathcal{R}^*_{[S]}$ denote
the tradeoff region. Then Theorem 1 shows that $\mathcal{R}^{**}_{[S]} = \mathcal{R}^*_{[S]}$. In
particular, for any $\epsilon > 0$ and $(R, A) \in \mathcal{R}^{**}_{[S]}$, there exists a sequence
of $(2^{n(R - \epsilon)}, 2^{n(A - \epsilon)}, n)$ codes with encoder $X^n$ and decoders
$\hat{W}(Y^n)$, $L_n(Y^n)$ such that $P_{e,w}^{(n)} = \Pr(W \ne \hat{W}) \to 0$ and
$P_{e,s}^{(n)} = \Pr([S]^n \notin L_n(Y^n)) \to 0$.
Now from the generalized Fano's inequality (Lemma 1), the
achievable list size reduction rate $A - \epsilon$ should satisfy

$$n(A - \epsilon) \le I([S]^n; Y^n) + n\epsilon_n \le I(S^n; Y^n) + n\epsilon_n$$

with $\epsilon_n \to 0$ as $n \to \infty$. Hence by letting $n \to \infty$ and $\epsilon \to 0$, we
have from the definition of $\mathcal{R}^*_I$ that

$$\mathcal{R}^{**}_{[S]} = \mathcal{R}^*_{[S]} \subseteq \mathcal{R}^*_I.$$

Also it follows trivially from repeating the intermediate steps in the
converse proof of Theorem 1 that $\mathcal{R}^*_I \subseteq \mathcal{R}^{**}$.

Finally taking a sequence of partitions with mesh $\to 0$ and hence
letting $\mathcal{R}^{**}_{[S]} \to \mathcal{R}^{**}$, we have the desired result. ■

Since both notions of state uncertainty reduction, the list size
reduction $nH(S) - \log |L_n|$ and the mutual information $I(S^n; Y^n)$, lead
to the same answer, we will subsequently use them interchangeably
and denote the tradeoff region by the same symbol $\mathcal{R}^*$.

Example 3: Consider Costa's writing on dirty paper model depicted
in Figure 4 as the canonical example of a continuous state-dependent
channel. Here the channel output is given by $Y^n = X^n + S^n + Z^n$,
where $X^n(W, S^n)$ is the channel input subject to a
power constraint $\sum_{i=1}^n E X_i^2 \le nP$, $S^n \sim N(0, QI)$ is the additive
white Gaussian state, and $Z^n \sim N(0, NI)$ is the white Gaussian
noise. We assume that $S^n$ and $Z^n$ are independent.

For the writing on dirty paper model, we have the following 
tradeoff between the pure information transmission and the state 
uncertainty reduction. 

Proposition 2: The tradeoff region $\mathcal{R}^*$ for the Gaussian channel
depicted in Figure 4 is characterized by the boundary points
$(R(\gamma), A(\gamma))$, $0 \le \gamma \le 1$, where

$$R(\gamma) = \frac{1}{2} \log\left(1 + \frac{\gamma P}{N}\right) \tag{26}$$
$$A(\gamma) = \frac{1}{2} \log\left(1 + \frac{(1-\gamma)P + Q + 2\sqrt{(1-\gamma)PQ}}{\gamma P + N}\right) \tag{27}$$

$^1$ Recall that the mutual information between arbitrary random variables
$X$ and $Y$ is defined as $I(X;Y) = \sup_{P,Q} I([X]_P; [Y]_Q)$, where the
supremum is over all finite partitions $P$ and $Q$; see Kolmogorov [18] and
Pinsker [25].



Proof sketch: The achievability follows from Proposition 1 with
trivial extension to the input power constraint. In particular, we use
the simple power sharing scheme proposed in [29], where a fraction
$\gamma$ of the input power is used to transmit the pure information using
Costa's writing on dirty paper coding technique, while the remaining
$(1 - \gamma)$ fraction of the power is used to amplify the state. In other
words,

$$X = V + \sqrt{\frac{(1-\gamma)P}{Q}}\, S \tag{28}$$

with $V \sim N(0, \gamma P)$ independent of $S$, and

$$U = V + \alpha S$$

with

$$\alpha = \frac{\gamma P}{\gamma P + N} \cdot \frac{\sqrt{(1-\gamma)P} + \sqrt{Q}}{\sqrt{Q}}.$$



Evaluating $R = I(U;Y) - I(U;S)$ and $A = I(S;Y)$ for each $\gamma$,
we recover (26) and (27).

The proof of the converse is essentially the same as that of [29,
Theorem 2], which we do not repeat here. ■
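The power sharing scheme can be checked numerically. The following Python sketch is illustrative only: the function name and the test values of P, Q, and N are assumptions, and the coefficient alpha is taken to be Costa's coefficient applied to the effective state seen at the output, which is an assumption of the sketch. It evaluates R = I(U;Y) - I(U;S) and A = I(S;Y), in bits, directly from the Gaussian covariances implied by X = V + sqrt((1 - gamma)P/Q) S and U = V + alpha S.

```python
import math

def gaussian_tradeoff_point(gamma, P, Q, N):
    """Return (R, A) in bits for a power-sharing fraction gamma in [0, 1]."""
    beta = math.sqrt((1.0 - gamma) * P / Q)  # amplification gain in X = V + beta*S
    var_v = gamma * P                         # power of the information-bearing part V
    c = 1.0 + beta                            # channel output is Y = V + c*S + Z
    var_y = var_v + c * c * Q + N
    # A = I(S;Y) = h(Y) - h(Y|S) for jointly Gaussian S and Y.
    A = 0.5 * math.log2(var_y / (var_v + N))
    if var_v == 0.0:
        return 0.0, A                         # no power left for the message
    # Costa's coefficient applied to the effective state c*S (assumption of this sketch).
    alpha = var_v / (var_v + N) * c
    var_u = var_v + alpha * alpha * Q         # U = V + alpha*S
    cov_uy = var_v + alpha * c * Q
    det_uy = var_u * var_y - cov_uy ** 2
    i_uy = 0.5 * math.log2(var_u * var_y / det_uy)  # I(U;Y)
    i_us = 0.5 * math.log2(var_u / var_v)           # I(U;S)
    return i_uy - i_us, A

for gamma in (0.0, 0.25, 0.5, 0.75, 1.0):
    R, A = gaussian_tradeoff_point(gamma, P=1.0, Q=1.0, N=1.0)
    print(f"gamma={gamma:.2f}  R={R:.4f}  A={A:.4f}")
```

Under these assumptions the computed R agrees with the dirty paper rate for message power gamma P, and sweeping gamma from 0 to 1 moves from pure state amplification to pure information transmission.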

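As an extreme point of the $(R, A)$ tradeoff region, we recover Costa's
writing on dirty paper result

$$C = \frac{1}{2} \log\left(1 + \frac{P}{N}\right)$$

by taking $\gamma = 1$. On the other hand, if state uncertainty reduction is
the goal, then all of the power should be used for state amplification.
The maximum uncertainty reduction rate

$$A^* = \frac{1}{2} \log\left(1 + \frac{P + Q + 2\sqrt{PQ}}{N}\right)$$

is achieved with $X = \sqrt{P/Q}\, S$ and $\alpha = 0$.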

In [29, Theorem 2], the optimal tradeoff was characterized between
the pure information rate $R$ and the receiver's state estimation error
$D = \frac{1}{n} E\|S^n - \hat{S}^n(Y^n)\|^2$. Although the notion of state
estimation error $D$ in [29] and our notion of the uncertainty reduction
rate $A$ appear to be distinct objectives at first sight, the optimal
solutions to both problems are identical, as shown in the proof of
Proposition 2. There is no surprise here. Because of the quadratic Gaussian
nature of both problems, minimizing the mean squared error
$E(S - \hat{S}(Y))^2$ can be recast into maximizing the mutual information
$I(S;Y)$, and vice versa. Also the optimal state uncertainty reduction rate
$A^*$ (or equivalently, the minimum state estimation error $D^*$) is achieved
by the symbol-by-symbol amplification $X_i = \sqrt{P/Q}\, S_i$.
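The link between mean squared error and mutual information can be illustrated for the pure amplification strategy. The sketch below is a single-letter illustration under stated assumptions: scalar jointly Gaussian S and Z, a linear input X = g S, and a hypothetical helper name and parameter values. It computes the minimum mean squared error D and I(S;Y) in bits, and uses the Gaussian identity I(S;Y) = (1/2) log(Q/D).

```python
import math

def amplify_only(P, Q, N, power_fraction=1.0):
    """MMSE D and I(S;Y) in bits for X = g*S with g^2 * Q = power_fraction * P.

    Assumes S ~ N(0, Q), Z ~ N(0, N) independent, and Y = X + S + Z.
    """
    g = math.sqrt(power_fraction * P / Q)
    c = 1.0 + g                       # Y = c*S + Z
    var_y = c * c * Q + N
    D = Q - (c * Q) ** 2 / var_y      # MMSE of S given Y (linear estimator is optimal here)
    I = 0.5 * math.log2(var_y / N)    # I(S;Y) = h(Y) - h(Z)
    return D, I

D_full, I_full = amplify_only(P=1.0, Q=1.0, N=1.0)
D_half, I_half = amplify_only(P=1.0, Q=1.0, N=1.0, power_fraction=0.5)
# For jointly Gaussian (S, Y), I(S;Y) = 1/2 log2(Q / D): smaller D means larger I.
print(D_full, I_full, 0.5 * math.log2(1.0 / D_full))
print(D_half, I_half)
```

Within this family of linear strategies, spending the full power on amplification gives both the smallest D and the largest I(S;Y), consistent with the equivalence described above.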

Finally, it is interesting to compare the optimal coding scheme (28) to
the optimal coding scheme when the goal is to minimize (instead of
maximizing) the uncertainty reduction [21], which is essentially based
on coherent subtraction of $X$ and $S$ with possible randomization.



IV. Optimal (R, A) Tradeoff: Causal Case

The previous two sections considered the case in which the
transmitter has complete knowledge of the state sequence $S^n$ prior to
the actual communication. In this section, we consider another model
in which the transmitter learns the state sequence on the fly, i.e., the
encoding function

$$X_i : [2^{nR}] \times \mathcal{S}^i \to \mathcal{X}, \qquad i = 1, 2, \ldots, n,$$

depends causally on the state sequence.
We state our main theorem.

Theorem 2: The tradeoff region $\mathcal{R}^*$ for a state-dependent channel
$(\mathcal{X} \times \mathcal{S}, p(y|x,s), \mathcal{Y})$ with state information
$S^n$ causally known at the transmitter is the union of all $(R, A)$ pairs
satisfying

$$R \le I(U;Y) \tag{29}$$
$$A \le H(S) \tag{30}$$
$$R + A \le I(X,S;Y) \tag{31}$$

for some joint distribution of the form $p(s)p(u)p(x|u,s)p(y|x,s)$,
where the auxiliary random variable $U$ has cardinality bounded by
$|\mathcal{U}| \le |\mathcal{X}| \cdot |\mathcal{S}|$.

As in the noncausal case, the region is convex. Since the auxiliary
random variable $U$ affects the first inequality (29) only, the cardinality
bound $|\mathcal{U}| \le |\mathcal{X}| \cdot |\mathcal{S}|$ follows again from
the standard argument. (A looser bound can be given by counting the number
of functions $f : \mathcal{S} \to \mathcal{X}$; see Shannon [27].) Finally, we
can take $X$ as a deterministic function of $(U, S)$ without decreasing the
region.

Compared to the noncausal tradeoff region $\mathcal{R}^*_{nc}$ in Theorem 1,
the causal tradeoff region $\mathcal{R}^*_{c}$ in Theorem 2 is smaller in
general. More precisely, $\mathcal{R}^*_{c}$ is characterized by the same set
of inequalities (3)-(5) as in $\mathcal{R}^*_{nc}$, but the set of joint
distributions is restricted to those with auxiliary variable $U$ independent
of $S$. Indeed, from the independence between $U$ and $S$, we can rewrite
(29) as

$$R \le I(U;Y) = I(U;Y) - I(U;S),$$

which is exactly the same as (3). Thus the inability to use the
future state sequence decreases the tradeoff region. However, only
the inequality (29), or equivalently, the inequality (3), is affected by
the causality, and the sum rate (31) does not change from (5).
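For a finite-alphabet channel, the three bounds of Theorem 2 can be evaluated directly for any choice of $p(u)$ and deterministic encoder $x = f(u, s)$. The following Python sketch is illustrative only: the helper names, the binary noisy XOR channel, and the two encoders are hypothetical choices that do not come from the paper. It builds the joint distribution $p(s)p(u)p(x|u,s)p(y|x,s)$ and returns $I(U;Y)$, $H(S)$, and $I(X,S;Y)$ in bits.

```python
import math
from itertools import product

def entropy_bits(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def causal_bounds(p_s, p_u, encoder, channel):
    """Return (I(U;Y), H(S), I(X,S;Y)) for U independent of S and X = encoder(U, S)."""
    joint = {}  # maps (u, s, x, y) to its probability
    for (u, pu), (s, ps) in product(p_u.items(), p_s.items()):
        x = encoder(u, s)
        for y, py in channel(x, s).items():
            key = (u, s, x, y)
            joint[key] = joint.get(key, 0.0) + pu * ps * py

    def H(indices):
        # Entropy of the marginal on the selected coordinates.
        marg = {}
        for key, p in joint.items():
            sub = tuple(key[i] for i in indices)
            marg[sub] = marg.get(sub, 0.0) + p
        return entropy_bits(marg.values())

    U, S, X, Y = 0, 1, 2, 3
    i_uy = H([U]) + H([Y]) - H([U, Y])
    h_s = H([S])
    i_xsy = H([X, S]) + H([Y]) - H([X, S, Y])
    return i_uy, h_s, i_xsy

def noisy_xor_channel(q):
    """Hypothetical binary channel: Y = X xor S, flipped with probability q."""
    def channel(x, s):
        clean = x ^ s
        return {clean: 1.0 - q, 1 - clean: q}
    return channel

p_s = {0: 0.5, 1: 0.5}
p_u = {0: 0.5, 1: 0.5}
channel = noisy_xor_channel(0.1)
cancel = causal_bounds(p_s, p_u, lambda u, s: u ^ s, channel)  # pre-cancel the state
ignore = causal_bounds(p_s, p_u, lambda u, s: u, channel)      # ignore the state
print("cancel:", cancel)
print("ignore:", ignore)
```

In this toy example both encoders give the same value of $I(X,S;Y)$, while $I(U;Y)$ differs, which mirrors the observation that the choice of $U$ enters only through the first inequality.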

Since the proof of Theorem 2 is essentially identical to that of
Theorem 1, we skip most of the steps. The least straightforward part
is the following lemma.

Lemma 3: Let $\mathcal{R}$ be the union of all $(R, A)$ pairs satisfying
(29)-(31). Let $\mathcal{R}_0$ be the closure of the union of all $(R, A)$
pairs satisfying (29), (30), and

$$R + A \le I(U,S;Y) \tag{32}$$

for some joint distribution $p(s)p(u)p(x|u,s)p(y|x,s)$ where the
auxiliary random variable $U$ has finite cardinality. Then

$$\mathcal{R} = \mathcal{R}_0.$$

Proof sketch: The proof is a verbatim copy of the proof of
Lemma 2, except that here $U$ is independent of $S$, i.e.,
$p(x,u|s) = p(u)p(x|u,s)$. The final step (13) follows since the set of
conditional distributions on $X$ and $(V, U)$ given $S$ of the form

$$p(x, v, u | s) = p(v)p(u)p(x|v,u,s) \tag{12'}$$

with deterministic $p(x|v,u,s)$ is as rich as any $p(u)p(x|u,s)$, and

$$I(V,U;Y) \ge I(U;Y). \tag{13'}$$

With this replacement, the desired proof follows along the same lines
as the proof of Lemma 2. ■
As one extreme point of the tradeoff region $\mathcal{R}^*$, we recover
the Shannon capacity formula [27] for channels with causal side
information at the transmitter as follows:

$$C = \max_{p(u)p(x|u,s)} I(U;Y). \tag{33}$$



On the other hand, the maximum uncertainty reduction rate $A^*$ for
pure state amplification is identical to that for the noncausal case
given in Corollary 1.

Corollary 3: Under the condition of Theorem 2, the maximum
uncertainty reduction rate $A^*$ is given by

$$A^* = \min\{H(S), \max_{p(x|s)} I(X,S;Y)\}. \tag{34}$$

Thus the receiver can learn about the state essentially at the maximum
cut-set rate, even under the causality constraint. For example, the
symbol-by-symbol amplification strategy $X = \sqrt{P/Q}\, S$ is optimal for
the Gaussian channel (Example 3) for both causal and noncausal
cases.

Finally, we compare the tradeoff regions $\mathcal{R}^*_{c}$ and
$\mathcal{R}^*_{nc}$ with a communication problem that has a totally
different motivation, yet has a similar capacity expression. In [32,
Situations 3 and 4], Willems and van der Meulen studied the multiple access
channel with cribbing encoders. In this communication problem, the multiple
access channel $(\mathcal{X} \times \mathcal{S}, p(y|x,s), \mathcal{Y})$ has
two inputs and one output. The primary transmitter $S$ and the secondary
transmitter $X$ wish to send independent messages $W_s \in [2^{nA}]$ and
$W_x \in [2^{nR}]$, respectively, to the common receiver $Y$. The difference
from the classical multiple access channel is that either the secondary
transmitter $X$ learns the primary transmitter's signal $S$ on the fly
($X_i(W_x, S^i)$ [32, Situation 3]) or $X$ knows the entire signal $S^n$
ahead of time ($X_i(W_x, S^n)$ [32, Situation 4]). The capacity region
$\mathcal{C}$ for both cases is given by all $(R, A)$ pairs satisfying

$$R \le I(X;Y|S) \tag{35}$$
$$A \le H(S) \tag{36}$$
$$R + A \le I(X,S;Y) \tag{37}$$

for some joint distribution $p(x,s)p(y|x,s)$.

This capacity region $\mathcal{C}$ looks almost identical to the tradeoff
regions $\mathcal{R}^*_{nc}$ and $\mathcal{R}^*_{c}$ in Theorems 1 and 2,
except for the first inequality (35). Moreover, (35) has the same form as
the capacity expression
for channels with state information available at both the encoder 
and decoder, either causally or noncausally. (The causality has no 
cost when both the transmitter and the receiver share the same side 
information; see, for example, Caire and Shamai [1, Proposition 1].)

It should be stressed, however, that the problem of cribbing multiple
access channels and our state uncertainty reduction problem have
a fundamentally different nature. The former deals with encoding and
decoding of the signal $S^n$, while the latter deals with uncertainty
reduction in an uncoded sequence $S^n$ specified by nature. In a sense,
the cribbing multiple access channel is a detection problem, while
the state uncertainty reduction is an estimation problem.

V. Concluding Remarks 

Because the channel is state dependent, the receiver is able to 
learn something about the channel state from directly observing 
the channel output. Thus, to help the receiver narrow down the 
uncertainty about the channel state at the highest rate possible, the 
sender must jointly optimize between facilitating state estimation 
and transmitting refinement information, rather than merely using 
the channel capacity to send the state description. In particular, the 
transmitter should summarize the state information in such a way 
that the summary information results in the maximum uncertainty 
reduction when coupled with the receiver's initial estimate of the 
state. More generally, by taking away some resources used to help 
the receiver reduce the state uncertainty, the transmitter can send 
additional pure information to the receiver and trace the entire $(R, A)$
tradeoff region. 



There are three surprises here. First, the receiver can learn about
the channel state and the independent message at a maximum cut-set
rate $I(X,S;Y)$ over all joint distributions $p(x,s)$ consistent with the
given state distribution $p(s)$. Second, to help the receiver reduce the
uncertainty in the initial estimate of the state (namely, to increase the
mutual information from $I(S;Y)$ to $I(X,S;Y)$), the transmitter can
allocate the achievable information rate $I(U;Y) - I(U;S)$ in two
alternative methods: random binning and its dual, random covering.
Third, as far as the sum rate $R + A$ and the maximum uncertainty
reduction rate $A^*$ are concerned, there is no cost associated with
restricting the encoder to learn the state sequence on the fly.